DETAILED ACTION
Response to Amendment 

1. Applicant has submitted an amendment filed 1/7/2005, amending base claim 1
while arguing to traverse the art rejection based on the amended limitations regarding
"the impact values being determinative of where inflection changes are to take place
within the sequence of speech items" (see claim amendment and second paragraph on
page 8 of the amendment). Applicant's arguments have been considered but are moot
in view of the new ground(s) of rejection over Conkie (US 6173263) necessitated by
the claim amendment.

2. Page 3, lines 4-26 and pages 21-26 only disclose different pitch values, but do
not include any known measurement unit associated with these pitch values.
Therefore, the previous ground of rejection of claims 9-15 remains.

Allowable Subject Matter 

3. Claims 16-20 are allowed over the prior art of record.

4. Regarding claim 16, Coorman et al. disclose a method for converting text to
concatenated voice by utilizing a digital voice library and a set of playback rules, the
method further comprising: determining a syllable count for each speech item in the
sequence of speech items (col. 23, ln. 35-45 and col. 23, ln. 35-67); determining an
impact value for each speech item in the sequence of speech items (the COST
FUNCTION sections on col. 12-15 explained in claim 1 in the response to argument



Application/Control Number: 09/818,331 Page 3 

Art Unit: 2655 

section above); determining a pitch value within a range for each speech item in the
sequence of speech items by normalizing the impact value for the particular speech
item (col. 13, ln. 48-53); determining a desired inflection for each speech item in the
sequence of speech items based on the syllable count and the pitch value for the
particular speech item and further based on the set of playback rules (col. 23, ln. 35-45
and col. 23, ln. 35-67 and the Cost Function in col. 12-15); determining a sequence of
voice recordings by determining a voice recording for each speech item based on the
desired inflection for the particular speech item and based on the available voice
recordings that correspond to the particular speech item (col. 9, ln. 33-37); and
generating voice data based on the sequence of voice recordings by concatenating
adjacent recordings in the sequence of voice recordings (col. 9, ln. 51-56). Coorman et
al. fail to specifically disclose the method wherein the playback rules dictate that the
desired inflection for a glue item is based on the desired inflection for surrounding
payload items and that the desired inflection for a payload item is based on the desired
inflection for nearest payload items with priority being given to speech items having a
greater pitch value such that the desired inflections are determined first for speech
items having the greatest pitch value and, thereafter, are determined for speech items in
order of descending pitch. Furthermore, it would not have been obvious to one of
ordinary skill in the art at the time of invention to modify Coorman et al. by incorporating
the teaching above. Therefore, claims 16-20 are allowed over the prior art of record.

Any comments considered necessary by applicant must be submitted no later 
than the payment of the issue fee and, to avoid processing delays, should preferably
accompany the issue fee. Such submissions should be clearly labeled "Comments on
Statement of Reasons for Allowance." 



Claim Rejections - 35 USC § 112 

5. The following is a quotation of the first paragraph of 35 U.S.C. 112: 

The specification shall contain a written description of the invention, and of the manner and process of 
making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the 
art to which it pertains, or with which it is most nearly connected, to make and use the same and shall 
set forth the best mode contemplated by the inventor of carrying out his invention. 



6. Claims 9-15 are rejected under 35 U.S.C. 112, first paragraph, as failing to 
comply with the enablement requirement. The claim(s) contains subject matter which 
was not described in the specification in such a way as to enable one skilled in the art to 
which it pertains, or with which it is most nearly connected, to make and/or use the 
invention. The specification discloses pitch values between 1 and 5, but fails to
indicate any measurement unit associated with these values that would enable one
skilled in the art to understand and use them.



Claim Rejections - 35 USC § 103 

7. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made.

8. Claims 1 and 8 are rejected under 35 U.S.C. 103(a) as being unpatentable over
Coorman et al. (US 6665641) in view of Conkie (US 6173263).

9. Regarding claim 1, Coorman et al. disclose a method for converting text to
concatenated voice by utilizing a digital voice library and a set of playback rules (col. 8,
ln. 59 to col. 9, ln. 56), the digital voice library including a plurality of speech items and a
corresponding plurality of voice recordings wherein each speech item corresponds to at
least one available voice recording wherein multiple voice recordings that correspond to
a single speech item represent various inflections of that single speech item (col. 9, ln.
1-8), the method including receiving text data, converting the text data into a sequence
of speech items in accordance with the digital voice library (col. 9, ln. 13-25), the
method further comprising: determining a syllable count for each speech item in the
sequence of speech items (col. 23, ln. 35-45); determining a cost in concatenating two
unit candidates together by calculating a mismatch between the pitch at the right-hand
edge of the left speech unit candidate and the pitch at the left-hand edge of the right
speech unit candidate (col. 12, lines 40-67), the cost function being determinative of
how well speech unit candidates fit together in a sequence representative of
synthesized speech (col. 11, line 41 to col. 12, line 67); and generating voice data
based on the sequence of voice recordings by concatenating adjacent recordings in the
sequence of voice recordings (col. 9, ln. 51-56).

Coorman et al. fail to specifically disclose the steps of determining an impact 
value for each speech item in the sequence of speech items, the impact values being
determinative of where inflection changes are to take place within the sequence of
speech items; determining a desired inflection for each speech item in the sequence of 
speech items based on the syllable count and the impact value for the particular speech 
item and further based on the set of playback rules; and determining a sequence of 
voice recordings by determining a voice recording for each speech item based on the 
desired inflection for the particular speech item and based on the available voice 
recordings that correspond to the particular speech item. 

However, Conkie teaches the step of determining and assigning patterns of
timing and intonation to the phonetic segment strings generated by the word
pronunciation module (col. 4, line 31 to col. 5, line 17, particularly col. 5, lines 5-17;
word pronunciation module 330 in figure 3 generates phonetic segment strings from the
input text. This module assigns annotation or inflection to a plurality of words in the input
text, directing the speech synthesizer to search for speech unit candidates matching
the assigned annotation or inflection for each word in the input text).

Since Coorman et al. and Conkie are analogous art, being from the
same field of endeavor, it would have been obvious to one of ordinary skill in the art at
the time of invention to modify Coorman et al. by incorporating the teaching of Conkie
in order to produce human-like synthetic speech.

10. Regarding claim 8, Coorman et al. further disclose that a plurality of speech
items includes a plurality of words, the method further comprising: determining a pitch
value for each speech item in the sequence of speech items by normalizing the impact
value for the particular speech item (col. 10, ln. 49-55 or col. 13, ln. 48-53), wherein the
desired inflection for each speech item is further based on the pitch value for the
particular speech item (col. 10, ln. 43-53).

11. Claim 2 is rejected under 35 U.S.C. 103(a) as being unpatentable over Coorman et
al. (US 6665641) in view of Conkie (US 6173263), and further in view of Jacks et al.
(US 4692941).

12. Regarding claim 2, Coorman et al. fail to specifically disclose that a plurality of the
speech items are glue items and a plurality of the speech items are payload items, the method
further comprising: setting a flag for any speech item in the sequence of speech items 
that is a glue item, wherein the playback rules dictate that the desired inflection for a 
glue item is based on the desired inflection for surrounding payload items in the 
sequence of speech items and that the desired inflection for a payload item is based on 
the desired inflection for nearest payload items in the sequence of speech items. 

However, Jacks et al. teach a method of setting a flag for any speech item in the
sequence of speech items that is a glue item (col. 4, ln. 48-50, the main point is to
identify glue words), wherein the playback rules dictate that the desired inflection for a
glue item is based on the desired inflection for surrounding payload items in the
sequence of speech items and that the desired inflection for a payload item is based on
the desired inflection for nearest payload items in the sequence of speech items (col. 9,
ln. 51 to col. 10, ln. 27). The advantage of using the teaching of Jacks et al. in Coorman et
al. is to analyze the structure of the sentence and assign appropriate prosody to each
word to make the synthesized speech sound more natural.

Since the modified Coorman et al. and Jacks et al. are analogous art, being
from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to modify Coorman et al. by
incorporating the teaching of Jacks et al. in order to analyze the structure of the
sentence and assign appropriate prosody to each word to make the synthesized speech
sound more natural.

13. Claims 3-5 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Coorman et al. (US 6665641) in view of Conkie (US 6173263), further in view of Jacks 
et al. (US 4692941), and further in view of Minowa et al. (US 6438522). 

14. Regarding claim 3, the modified Coorman et al. fail to specifically disclose that a 
plurality of speech items includes a plurality of phrases. However, Minowa et al. teach 
that a plurality of speech items includes a plurality of phrases (col. 7, ln. 6-10). The
advantage of using the teaching of Minowa et al. in the modified Coorman et al. is to 
allow the system to process phrase input speech items. 

Since the modified Coorman et al. and Minowa et al. are analogous art, being
from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to further modify Coorman et
al. by incorporating the teaching of Minowa et al. in order to allow the system to process
phrase input speech items.

15. Regarding claim 4, the modified Coorman et al. fail to specifically disclose that a
plurality of speech items includes a plurality of words. However, Minowa et al. teach
that a plurality of speech items includes a plurality of words (col. 7, ln. 6-10). The
advantage of using the teaching of Minowa et al. in the modified Coorman et al. is to
allow the system to process single word input speech items.

Since the modified Coorman et al. and Minowa et al. are analogous art, being
from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to further modify Coorman et
al. by incorporating the teaching of Minowa et al. in order to allow the system to process
single word input speech items.

16. Regarding claim 5, the modified Coorman et al. fail to specifically disclose that a
plurality of speech items includes a plurality of syllables. However, Minowa et al. teach
that a plurality of speech items includes a plurality of syllables (col. 7, ln. 10-25). The
advantage of using the teaching of Minowa et al. in the modified Coorman et al. is to
increase processing speed by using a syllable-based segmentation scheme to reduce the
number of speech models.

Since the modified Coorman et al. and Minowa et al. are analogous art, being
from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to further modify Coorman et
al. by incorporating the teaching of Minowa et al. in order to increase processing speed
by using a syllable-based segmentation scheme to reduce the number of speech models.

17. Claim 6 is rejected under 35 U.S.C. 103(a) as being unpatentable over Coorman 
et al. (US 6665641) in view of Conkie (US 6173263), and further in view of Gasper et al. 
(US 5278943). 

18. Regarding claim 6, Coorman et al. fail to specifically disclose that multiple voice
recordings that correspond to a single speech item represent various inflections of that
single speech item and wherein the various inflections belong to various inflection
groups including at least one standard inflection group, at least one emphatic
inflection group, and at least one question inflection group. However, Gasper et al.
suggest stored recordings having different prosodic environments (col. 13, ln. 18-29).
Thus, it would have been obvious to one of ordinary skill in the art at the time the
invention was made to modify Coorman et al. by specifically making recordings of these
different inflections to provide the digital library a wide range of speech variations of
particular words to enhance speech synthesis capabilities and increase the system's
reliability.

19. Claim 7 is rejected under 35 U.S.C. 103(a) as being unpatentable over Coorman
et al. (US 6665641) in view of Conkie (US 6173263), further in view of Gasper et al. (US
5278943) and further in view of Jacks et al. (US 4692941).

20. Regarding claim 7, the modified Coorman et al. fail to specifically disclose that at
least one question inflection group includes a single word question inflection group and
a multiple word question inflection group. However, Jacks et al. teach that at least one
question inflection group includes a single word question inflection group and a multiple
word question inflection group (col. 9, ln. 45-50). The advantage of using the teaching
of Jacks et al. in Coorman et al. is to assign appropriate pitch to word(s) in a question to
make the speech sound more natural.

Since the modified Coorman et al. and Jacks et al. are analogous art, being
from the same field of endeavor, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to further modify Coorman et
al. by incorporating the teaching of Jacks et al. in order to assign appropriate pitch to
word(s) in a question to make the speech sound more natural.

Conclusion 

Applicant's amendment necessitated the new ground(s) of rejection presented in 
this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP 
§ 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 
CFR 1.136(a).

A shortened statutory period for reply to this final action is set to expire THREE
MONTHS from the mailing date of this action. In the event a first reply is filed within 
TWO MONTHS of the mailing date of this final action and the advisory action is not 
mailed until after the end of the THREE-MONTH shortened statutory period, then the 
shortened statutory period will expire on the date the advisory action is mailed, and any 
extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of
the advisory action. In no event, however, will the statutory period for reply expire later 
than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Huyen Vo whose telephone number is 703-305-8665. 
The examiner can normally be reached on M-F, 9-5:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Doris To can be reached on 703-305-4827. The fax phone number for the 
organization where this application or proceeding is assigned is 703-872-9306. 

Information regarding the status of an application may be obtained from the 
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

HXV 5/19/2005 




SUSAN MCPADDEN 
PRIMARY EXAMINER 



